Comparing General and Medical Texts for Information Retrieval Based on Natural Language Processing: An Inquiry into Lexical Disambiguation
Abstract
In this paper we compare two types of corpus, focusing on the lexical ambiguity of each of them. The first corpus consists mainly of general newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we used two different disambiguation tools. Each tool was first validated in its respective application area. We then used these systems to assess and compare both the general ambiguity rate and the particularities of each domain. Quantitative results show that medical documents are lexically less ambiguous than unrestricted documents. Our conclusions emphasize the importance of the application area in the design of NLP tools.
Similar resources
Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine
This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Comm...
Roget's Thesaurus as a Lexical Resource for Natural Language Processing
WordNet proved that it is possible to construct a large-scale electronic lexical database on the principles of lexical semantics. It has been accepted and used extensively by computational linguists ever since it was released. Some of its applications include information retrieval, language generation, question answering, text categorization, text classification and word sense disambiguation. I...
Comparing Corpora And Lexical Ambiguity
In this paper we compare two types of corpus, focusing on the lexical ambiguity of each of them. The first corpus consists mainly of newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we have used two different disambiguation tools. However, first of all, we must verify the performance of each system in its respective application do...
Evaluating Resources for Query Translation in Cross-Language Information Retrieval
Our goal is to evaluate the utility of a lexical resource containing Lexical Conceptual Structures (LCS) for use in cross-language information retrieval. Our evaluation makes use of a combination of techniques from interlingual machine translation (Dorr) with conventional information retrieval techniques (Oard; Oard and Dorr). Given a query in one language, we transform the query into the corresponding ter...
Word Sense Clustering Based on Translation Equivalence in Parallel Texts; a Case Study in Romanian
Lexical ambiguity is one of the most difficult problems to solve in natural language processing. Word sense disambiguation (WSD) is a task of utmost importance and, as such, the great interest in this area of research is not surprising. We argue that the difficulty of solving the problem of sense disambiguation depends to a large extent on the application which requires a solutio...
Journal title:
- Studies in health technology and informatics
Volume 84, Pt 1
Publication date: 2001